Pilot experiment

Descriptives

Task description

Each experiment contains 6 trials. Each trial contains 6 training slides (each with an unambiguous utterance) and 1 test slide (with an ambiguous utterance), as shown below.

[Figures: training slide; test slide]

The items appearing in a trial are drawn from a set of 3 categories (e.g. musical instruments, fruits, vehicles), and each set is used in only one of the 6 trials. We run the experiment with the distributions (6-0-0), (4-2-0), and (2-2-2), where each number gives how many times items from one category appear in the trial.
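To make the distribution notation concrete, the sketch below expands a distribution into one category label per training slide. It assumes the counts refer to the 6 training slides and that slide order is shuffled; neither is stated above. The function name and category labels are illustrative only.

```python
import random

def training_categories(distribution, categories, rng):
    """Return one category label per training slide, in shuffled order.

    Assumption: each count in `distribution` is the number of training
    slides showing an item from the matching entry in `categories`.
    """
    if len(distribution) != len(categories):
        raise ValueError("need exactly one count per category")
    sequence = []
    for category, count in zip(categories, distribution):
        sequence.extend([category] * count)
    rng.shuffle(sequence)  # assumed: presentation order is randomized
    return sequence

rng = random.Random(0)  # fixed seed so the sketch is reproducible
categories = ["instruments", "fruits", "vehicles"]
for distribution in [(6, 0, 0), (4, 2, 0), (2, 2, 2)]:
    print(distribution, training_categories(distribution, categories, rng))
```

Under (4-2-0), for example, the result always holds four labels of the first category, two of the second, and none of the third, whatever the order.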

Sample size

We exclude responses with fewer than 5 of 6 training slides answered correctly. If several responses share an IP address, we keep only the first.
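The two exclusion rules can be sketched as a single filter. This is a minimal sketch, not the script actually used: the field names `ip` and `training_correct` are hypothetical, responses are assumed to arrive in submission order, and duplicates are assumed to be dropped before the accuracy cutoff is applied (the text does not say which rule comes first).

```python
def filter_responses(responses, min_correct=5):
    """Keep the first response per IP, then apply the training-accuracy cutoff.

    Assumptions: `responses` is in submission order, and each response has
    hypothetical fields "ip" and "training_correct" (correct answers out of 6).
    """
    seen_ips = set()
    kept = []
    for response in responses:
        ip = response["ip"]
        if ip in seen_ips:
            continue  # later response from an IP already seen
        seen_ips.add(ip)
        if response["training_correct"] >= min_correct:
            kept.append(response)
    return kept

# Toy data for illustration only, not the pilot responses.
responses = [
    {"ip": "10.0.0.1", "training_correct": 6},
    {"ip": "10.0.0.1", "training_correct": 6},  # duplicate IP, dropped
    {"ip": "10.0.0.2", "training_correct": 4},  # below the cutoff, dropped
    {"ip": "10.0.0.3", "training_correct": 5},
]
print([r["ip"] for r in filter_responses(responses)])
```

Applying the cutoff before removing duplicates could keep a different response when the first submission from an IP fails the cutoff, so the order is worth fixing explicitly.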

n = 19

Finding

In the (4-2-0) distribution, responses are graded: how often a category is chosen tracks how often that category appeared in the trial. This is consistent with our hypothesis that people use common ground to resolve anaphora.

However, this result could also be driven by a recency effect, in which people simply choose the category of the most recently seen item.
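One way to probe the recency explanation is to tabulate test choices separately by the category of the last training slide. The sketch below assumes each (4-2-0) trial is recorded with two hypothetical fields, `last_seen` and `choice`, coded by how often the category appeared ("four", "two", or "zero"). The rows are made up for illustration and are not pilot data.

```python
from collections import Counter

def tally_by_last_seen(trials):
    """Count test-slide choices, grouped by the last-seen training category."""
    table = {}
    for trial in trials:
        table.setdefault(trial["last_seen"], Counter())[trial["choice"]] += 1
    return table

# Made-up rows for illustration only.
trials = [
    {"last_seen": "four", "choice": "four"},
    {"last_seen": "two", "choice": "four"},
    {"last_seen": "two", "choice": "two"},
]
for last_seen, counts in tally_by_last_seen(trials).items():
    print(last_seen, dict(counts))
```

If choices follow `last_seen` regardless of frequency, recency is the likelier driver. If the more frequent category is still preferred when the less frequent one came last, the frequency account gains support.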